match function
Mitigating LLM Hallucinations via Conformal Abstention
Yadkori, Yasin Abbasi, Kuzborskij, Ilja, Stutz, David, György, András, Fisch, Adam, Doucet, Arnaud, Beloshapka, Iuliya, Weng, Wei-Hung, Yang, Yao-Yuan, Szepesvári, Csaba, Cemgil, Ali Taylan, Tomasev, Nenad
We develop a principled procedure for determining when a large language model (LLM) should abstain from responding (e.g., by saying "I don't know") in a general domain, instead of resorting to possibly "hallucinating" a non-sensical or incorrect answer. Building on earlier approaches that use self-consistency as a more reliable measure of model confidence, we propose using the LLM itself to self-evaluate the similarity between each of its sampled responses for a given query. We then further leverage conformal prediction techniques to develop an abstention procedure that benefits from rigorous theoretical guarantees on the hallucination rate (error rate). Experimentally, our resulting conformal abstention method reliably bounds the hallucination rate on various closed-book, open-domain generative question answering datasets, while also maintaining a significantly less conservative abstention rate on a dataset with long responses (Temporal Sequences) compared to baselines using log-probability scores to quantify uncertainty, while achieveing comparable performance on a dataset with short answers (TriviaQA). To evaluate the experiments automatically, one needs to determine if two responses are equivalent given a question. Following standard practice, we use a thresholded similarity function to determine if two responses match, but also provide a method for calibrating the threshold based on conformal prediction, with theoretical guarantees on the accuracy of the match prediction, which might be of independent interest.
A Survey of Adaptive Resonance Theory Neural Network Models for Engineering Applications
da Silva, Leonardo Enzo Brito, Elnabarawy, Islam, Wunsch, Donald C. II
This survey samples from the ever-growing family of adaptive resonance theory (ART) neural network models used to perform the three primary machine learning modalities, namely, unsupervised, supervised and reinforcement learning. It comprises a representative list from classic to modern ART models, thereby painting a general picture of the architectures developed by researchers over the past 30 years. The learning dynamics of these ART models are briefly described, and their distinctive characteristics such as code representation, long-term memory and corresponding geometric interpretation are discussed. Useful engineering properties of ART (speed, configurability, explainability, parallelization and hardware implementation) are examined along with current challenges. Finally, a compilation of online software libraries is provided. It is expected that this overview will be helpful to new and seasoned ART researchers.
Distributed dual vigilance fuzzy adaptive resonance theory learns online, retrieves arbitrarily-shaped clusters, and mitigates order dependence
da Silva, Leonardo Enzo Brito, Elnabarawy, Islam, Wunsch, Donald C. II
This paper presents a novel adaptive resonance theory (ART)-based modular architecture for unsupervised learning, namely the distributed dual vigilance fuzzy ART (DDVFA). DDVFA consists of a global ART system whose nodes are local fuzzy ART modules. It is equipped with the distinctive features of distributed higher-order activation and match functions, using dual vigilance parameters responsible for cluster similarity and data quantization. Together, these allow DDVFA to perform unsupervised modularization, create multi-prototype clustering representations, retrieve arbitrarily-shaped clusters, and control its compactness. Another important contribution is the reduction of order-dependence, an issue that affects any agglomerative clustering method. This paper demonstrates two approaches for mitigating order-dependence: preprocessing using visual assessment of cluster tendency (VAT) or postprocessing using a novel Merge ART module. The former is suitable for batch processing, whereas the latter can be used in online learning. Experimental results in the online learning mode carried out on 30 benchmark data sets show that DDVFA cascaded with Merge ART statistically outperformed the best other ART-based systems when samples were randomly presented. Conversely, they were found to be statistically equivalent in the offline mode when samples were pre-processed using VAT. Remarkably, performance comparisons to non-ART-based clustering algorithms show that DDVFA (which learns incrementally) was also statistically equivalent to the non-incremental (offline) methods of DBSCAN, single linkage hierarchical agglomerative clustering (HAC), and k-means, while retaining the appealing properties of ART. Links to the source code and data are provided. Considering the algorithm's simplicity, online learning capability, and performance, it is an ideal choice for many agglomerative clustering applications.
Performance Bounds for Pairwise Entity Resolution
Barnes, Matt, Miller, Kyle, Dubrawski, Artur
One significant challenge to scaling entity resolution algorithms to massive datasets is understanding how performance changes after moving beyond the realm of small, manually labeled reference datasets. Unlike traditional machine learning tasks, when an entity resolution algorithm performs well on small hold-out datasets, there is no guarantee this performance holds on larger hold-out datasets. We prove simple bounding properties between the performance of a match function on a small validation set and the performance of a pairwise entity resolution algorithm on arbitrarily sized datasets. Thus, our approach enables optimization of pairwise entity resolution algorithms for large datasets, using a small set of labeled data.
OPS, a domain-independent production system language
Abstract: It has been claimed that production systems have several advantages over other representational schemes. These include the potential for general self-augmentation (i.e., learning of new behavior) and the ability to function in complex environments. The production system language, OPS, was implemented to test these claims. In this paper we explore some of the issues that bear on the design of production system languages and try to show the adequacy of OPS for its intended purpose. I. INTRODUCTION Much of the work that has been done with production systems during the past few years has had as its primary goal the development of systems that are expert in some particular task. The tasks so far addressed include: chemical inference [Buchanan and Lederberg, J 971], medical diagnosis [Davis, Buchanan, and Shortliffe, 1975], discovery in mathematics [Lenat, 1976], speech recognition [Erman and Lesser, 1975; McCracken, 1977], and automatic programming [Barstow, 1977]. Although many of these systems have shown impressive power in the particular task for which they were designed, there remains a question of how suitable the production system representation is for large general problem solving programs. The Instructable Production System (IPS) project at CMU [Rychener and Newell, 1977] is attempting to answer this question. It has been claimed that production systems are capable of learning in a nontrivial way. If this is true, a production system should be able to learn not only facts, but also new behaviors.